Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] MGMT-19741: Add Nutanix support #2550

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

eliorerz
Copy link

/cc @eliorerz

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 23, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2025

@eliorerz: This pull request references MGMT-19741 which is a valid jira issue.

In response to this:

/cc @eliorerz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2025
Copy link
Contributor

openshift-ci bot commented Jan 23, 2025

@eliorerz: GitHub didn't allow me to request PR reviews from the following users: eliorerz.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @eliorerz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

openshift-ci bot commented Jan 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: eliorerz
Once this PR has been reviewed and has the lgtm label, please assign dlom for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eliorerz eliorerz force-pushed the NO-ISSUE-add-nutanix-support branch 2 times, most recently from 89c5e08 to f254bcb Compare January 23, 2025 19:12
Copy link

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 0% with 532 lines in your changes missing coverage. Please review.

Project coverage is 48.99%. Comparing base (5d3f4d7) to head (fa1957c).

Files with missing lines Patch % Lines
pkg/controller/machinepool/nutanixactuator.go 0.00% 153 Missing ⚠️
pkg/clusterresource/nutanix.go 0.00% 104 Missing ⚠️
contrib/pkg/deprovision/nutanix.go 0.00% 59 Missing ⚠️
contrib/pkg/createcluster/create.go 0.00% 53 Missing ⚠️
...g/controller/clusterpool/clusterpool_controller.go 0.00% 28 Missing ⚠️
pkg/install/generate.go 0.00% 28 Missing ⚠️
pkg/nutanixclient/client.go 0.00% 27 Missing ⚠️
pkg/installmanager/installmanager.go 0.00% 23 Missing ⚠️
.../v1/clusterdeployment_validating_admission_hook.go 0.00% 12 Missing and 1 partial ⚠️
...s/hive/v1/machinepool_validating_admission_hook.go 0.00% 11 Missing and 1 partial ⚠️
... and 6 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2550      +/-   ##
==========================================
- Coverage   49.78%   48.99%   -0.79%     
==========================================
  Files         281      286       +5     
  Lines       33029    33560     +531     
==========================================
  Hits        16444    16444              
- Misses      15252    15781     +529     
- Partials     1333     1335       +2     
Files with missing lines Coverage Δ
pkg/constants/constants.go 100.00% <ø> (ø)
...shift/hive/apis/hive/v1/clusterdeployment_types.go 0.00% <ø> (ø)
...hift/hive/apis/hive/v1/clusterdeprovision_types.go 0.00% <ø> (ø)
...m/openshift/hive/apis/hive/v1/machinepool_types.go 0.00% <ø> (ø)
contrib/pkg/deprovision/deprovision.go 0.00% <0.00%> (ø)
...g/controller/machinepool/machinepool_controller.go 56.57% <0.00%> (-0.14%) ⬇️
pkg/controller/utils/credentials.go 0.00% <0.00%> (ø)
...oller/clusterdeployment/installconfigvalidation.go 84.84% <0.00%> (-15.16%) ⬇️
.../clusterdeployment/clusterdeployment_controller.go 66.62% <0.00%> (-0.41%) ⬇️
contrib/pkg/utils/nutanix/nutanix.go 0.00% <0.00%> (ø)
... and 10 more

Copy link
Member

@2uasimojo 2uasimojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't make it very far today. I'll keep hammering away at it next week.

@@ -0,0 +1,55 @@
package nutanix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding a nutanix SME to help review this and the CD Platform file, as I don't have the context to know whether these fields are correct and necessary/sufficient.

Also wouldn't be a terrible idea to get an API reviewer's eyes on this. I'm not an expert, but I think it may be best practice to include a +optional/+required tag even when it's the default; and it may be appropriate for some of the fields to be pointers rather than scalars if "unspecified" has different semantics than the nil value.

type MachinePool struct {

// NumVcpusPerSocket is the total number of virtual processor cores (per socket) to assign a vm.
NumVcpusPerSocket int64 `json:"vcpus"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also get a linter warning about this, but unless there's a good reason to do otherwise (e.g. we're copy-pasting an upstream type def (in which case we should look into just importing it)) the json names should match the go field names (modulo case mapping rules).

Several fields affected in this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self: It's weird that this (and others for platforms hive doesn't support) have been pulled in via this PR. Need to investigate further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the new types are using defs from openshift/api/machine/v1 (which, annoyingly, pulls in machine/v1beta1). This is definitely preferable to copying in those defs, though it's disappointing that it drags in a bunch of stuff we don't need.

Size-wise:

efried@efried-thinkpadp16vgen1:~/go/src/github.com/openshift/hive$ git show --name-status | awk '/^A.*vendor/ {print $2}' | xargs du -c
4	apis/vendor/github.com/openshift/api/machine/v1/Makefile
4	apis/vendor/github.com/openshift/api/machine/v1/common.go
4	apis/vendor/github.com/openshift/api/machine/v1/doc.go
4	apis/vendor/github.com/openshift/api/machine/v1/register.go
20	apis/vendor/github.com/openshift/api/machine/v1/types_alibabaprovider.go
4	apis/vendor/github.com/openshift/api/machine/v1/types_aws.go
28	apis/vendor/github.com/openshift/api/machine/v1/types_controlplanemachineset.go
16	apis/vendor/github.com/openshift/api/machine/v1/types_nutanixprovider.go
12	apis/vendor/github.com/openshift/api/machine/v1/types_powervsprovider.go
36	apis/vendor/github.com/openshift/api/machine/v1/zz_generated.deepcopy.go
4	apis/vendor/github.com/openshift/api/machine/v1/zz_generated.featuregated-crd-manifests.yaml
48	apis/vendor/github.com/openshift/api/machine/v1/zz_generated.swagger_doc_generated.go
4	apis/vendor/github.com/openshift/api/machine/v1beta1/Makefile
4	apis/vendor/github.com/openshift/api/machine/v1beta1/doc.go
4	apis/vendor/github.com/openshift/api/machine/v1beta1/register.go
16	apis/vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.go
32	apis/vendor/github.com/openshift/api/machine/v1beta1/types_azureprovider.go
20	apis/vendor/github.com/openshift/api/machine/v1beta1/types_gcpprovider.go
24	apis/vendor/github.com/openshift/api/machine/v1beta1/types_machine.go
8	apis/vendor/github.com/openshift/api/machine/v1beta1/types_machinehealthcheck.go
12	apis/vendor/github.com/openshift/api/machine/v1beta1/types_machineset.go
16	apis/vendor/github.com/openshift/api/machine/v1beta1/types_provider.go
12	apis/vendor/github.com/openshift/api/machine/v1beta1/types_vsphereprovider.go
52	apis/vendor/github.com/openshift/api/machine/v1beta1/zz_generated.deepcopy.go
4	apis/vendor/github.com/openshift/api/machine/v1beta1/zz_generated.featuregated-crd-manifests.yaml
80	apis/vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/doc.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/machinepools.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/platform.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/zz_generated.deepcopy.go
4	vendor/github.com/openshift/installer/pkg/destroy/nutanix/OWNERS
4	vendor/github.com/openshift/installer/pkg/destroy/nutanix/doc.go
8	vendor/github.com/openshift/installer/pkg/destroy/nutanix/nutanix.go
4	vendor/github.com/openshift/installer/pkg/destroy/nutanix/register.go
508	total

So we're adding about half a meg to the repo. Most of it (488K) is under apis/, which is where our downstreams take a hit. But it's under openshift/api, which seems acceptable -- particularly considering it's likely they'll be importing that anyway.

Copy link
Member

@2uasimojo 2uasimojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously you're not quite finished yet, but this is an awesome start.

In large part, I'm going to be avoiding in-depth review of the platform-specific esoterica of this work, assuming that you and QE will be testing the functionality, and if it works, it works :)

One specific area I want to bring up: You have some code in here that would exist exclusively to support clusterpools, but you may not necessarily want to do that right away. It's the kind of thing that could be added later if you were looking for an opportunity to split this effort into smaller chunks.

Similarly, a hibernation actuator is absent from this PR -- which is fine as it's also something that lends itself to a separate chunk of work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the new types are using defs from openshift/api/machine/v1 (which, annoyingly, pulls in machine/v1beta1). This is definitely preferable to copying in those defs, though it's disappointing that it drags in a bunch of stuff we don't need.

Size-wise:

efried@efried-thinkpadp16vgen1:~/go/src/github.com/openshift/hive$ git show --name-status | awk '/^A.*vendor/ {print $2}' | xargs du -c
4	apis/vendor/github.com/openshift/api/machine/v1/Makefile
4	apis/vendor/github.com/openshift/api/machine/v1/common.go
4	apis/vendor/github.com/openshift/api/machine/v1/doc.go
4	apis/vendor/github.com/openshift/api/machine/v1/register.go
20	apis/vendor/github.com/openshift/api/machine/v1/types_alibabaprovider.go
4	apis/vendor/github.com/openshift/api/machine/v1/types_aws.go
28	apis/vendor/github.com/openshift/api/machine/v1/types_controlplanemachineset.go
16	apis/vendor/github.com/openshift/api/machine/v1/types_nutanixprovider.go
12	apis/vendor/github.com/openshift/api/machine/v1/types_powervsprovider.go
36	apis/vendor/github.com/openshift/api/machine/v1/zz_generated.deepcopy.go
4	apis/vendor/github.com/openshift/api/machine/v1/zz_generated.featuregated-crd-manifests.yaml
48	apis/vendor/github.com/openshift/api/machine/v1/zz_generated.swagger_doc_generated.go
4	apis/vendor/github.com/openshift/api/machine/v1beta1/Makefile
4	apis/vendor/github.com/openshift/api/machine/v1beta1/doc.go
4	apis/vendor/github.com/openshift/api/machine/v1beta1/register.go
16	apis/vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.go
32	apis/vendor/github.com/openshift/api/machine/v1beta1/types_azureprovider.go
20	apis/vendor/github.com/openshift/api/machine/v1beta1/types_gcpprovider.go
24	apis/vendor/github.com/openshift/api/machine/v1beta1/types_machine.go
8	apis/vendor/github.com/openshift/api/machine/v1beta1/types_machinehealthcheck.go
12	apis/vendor/github.com/openshift/api/machine/v1beta1/types_machineset.go
16	apis/vendor/github.com/openshift/api/machine/v1beta1/types_provider.go
12	apis/vendor/github.com/openshift/api/machine/v1beta1/types_vsphereprovider.go
52	apis/vendor/github.com/openshift/api/machine/v1beta1/zz_generated.deepcopy.go
4	apis/vendor/github.com/openshift/api/machine/v1beta1/zz_generated.featuregated-crd-manifests.yaml
80	apis/vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/doc.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/machinepools.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/platform.go
4	vendor/github.com/openshift/hive/apis/hive/v1/nutanix/zz_generated.deepcopy.go
4	vendor/github.com/openshift/installer/pkg/destroy/nutanix/OWNERS
4	vendor/github.com/openshift/installer/pkg/destroy/nutanix/doc.go
8	vendor/github.com/openshift/installer/pkg/destroy/nutanix/nutanix.go
4	vendor/github.com/openshift/installer/pkg/destroy/nutanix/register.go
508	total

So we're adding about half a meg to the repo. Most of it (488K) is under apis/, which is where our downstreams take a hit. But it's under openshift/api, which seems acceptable -- particularly considering it's likely they'll be importing that anyway.

@@ -76,6 +61,23 @@ import (
k8slabels "github.com/openshift/hive/pkg/util/labels"
"github.com/openshift/hive/pkg/util/scheme"
yamlutils "github.com/openshift/hive/pkg/util/yaml"
"github.com/openshift/installer/pkg/destroy/aws"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This delta seems unnecessary. We usually deliberately keep the hive imports in their own section.

I'm curious if this was done by some tool you're using that reorganizes imports. I use the one in VSCode, which wouldn't do this unless I removed the newline between the sections.

I'm not asking you to change it back (though @suhanime might :P )

Comment on lines 44 to 55
//type BootType string

//const (
// // VMs Disk Types
// DiskDeviceTypeCDROM DiskDeviceType = "CDROM"
// DiskDeviceTypeDISK DiskDeviceType = "DISK"
//
// // VMs Boot Types
// NutanixBootTypeUEFI BootType = "UEFI"
// NutanixBootTypeLegacy BootType = "LEGACY"
// NutanixBootTypeSecureBoot BootType = "SECURE_BOOT"
//)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to clean this up.

return nil, fmt.Errorf("no %s env var set, cannot proceed", constants.NutanixUsernameEnvVar)
}

nutanixPassword := os.Getenv(constants.NutanixPasswordEnvVar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there's precedent for grabbing these creds via env, but it would be preferable to use a file if that's feasible. I'm not asking you to invent a config file format, but does nutanix already have one that it supports?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with one but we can store that in a JSON or yaml formats. It will be easy to read and to maintain

@@ -1,17 +1,18 @@
package clusterdeployment

import (
hivev1aws "github.com/openshift/hive/apis/hive/v1/aws"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely don't agree with this delta, as it

  • splits up hive imports
  • gloms hive imports with core imports.

@@ -1259,6 +1259,10 @@ func (r *ReconcileMachinePool) createActuator(
logger log.FieldLogger,
) (Actuator, error) {
switch {
case cd.Spec.Platform.Nutanix != nil:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we put this case in its alpha-ordered spot in the switch?

actuator := &NutanixActuator{
client: client,
scheme: scheme,
//nutanixClient: nutanixClient,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about whether you're actually going to need this. In general, the major cloud bit -- managing VMs -- is done way down the line by MAO, using the cloud creds on the spoke, operating on the MachineSets we generate. Exceptions exist: for example, in GCP and IBMCloud we need to discover zone names. But by default I would say YAGNI.

Comment on lines 103 to 124
installerMachineSets, err := installvsphere.MachineSets(
cd.Spec.ClusterMetadata.InfraID,
ic,
computePool,
// With FailureDomains[0].Topology.Template set, this param is ignored.
"HIVE_BUG!",
workerRole,
workerUserDataName,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copypasta, looks like this still needs some work.

When we get down to it though, I think we should take the nutanix installer team up on their offer to help out by reviewing this func.

The general principle is that our code here needs to dummy up the inputs to the platform-specific MachineSets generator. We can not rely on having access to the original install-config.yaml for the cluster, which is sometimes annoying and awkward, forcing us to either assume certain values or duplicate them in the (hive) MachinePool CRD. Since we're making this actuator from scratch, we should take the opportunity to evaluate those things carefully; else we can end up painting ourselves into a corner where e.g. we have to support bogus/deprecated/obsolete fields "forever".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file will want code for the uninstaller as well. Examples for other platforms are referenced in a switch starting around L519. And that'll correspond to a hiveutil deprovision nutanix subcommand, with code under contrib/pkg/deprovision, plugged into deprovision.go.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2025
@eliorerz eliorerz force-pushed the NO-ISSUE-add-nutanix-support branch 14 times, most recently from 512216c to 6127f44 Compare February 14, 2025 08:34
@eliorerz eliorerz force-pushed the NO-ISSUE-add-nutanix-support branch 2 times, most recently from 90385a3 to 507c803 Compare February 17, 2025 10:21
@eliorerz eliorerz force-pushed the NO-ISSUE-add-nutanix-support branch from 507c803 to 3584490 Compare February 17, 2025 12:07
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 17, 2025
Copy link
Contributor

openshift-ci bot commented Feb 17, 2025

@eliorerz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security fa1957c link true /test security
ci/prow/e2e-vsphere fa1957c link true /test e2e-vsphere

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants